Data Visualization Project 02

This project explores and visualizes two datasets:

Florida Lakes Shapefile (spatial data)

Atlanta 2019 Daily Weather Data (CSV format)

The assignment goal was to use different data visualization techniques—including spatial, interactive, and model-based visualizations—to uncover trends and communicate insights effectively.

Initially, I aimed to create the following types of visualizations:

Spatial Map: Displaying water bodies in Florida using shapefiles from Natural Earth.

Interactive Plot: Using leaflet to build an interactive map of lakes in Florida, where clicking reveals the lake names.

Model-Based Visualization: Fitting a linear regression model to weather data to analyze how humidity and dew point affect daily high temperatures, and visualizing the model’s coefficients.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(leaflet)
## Warning: package 'leaflet' was built under R version 4.4.3
library(sf)
## Warning: package 'sf' was built under R version 4.4.3
## Linking to GEOS 3.13.0, GDAL 3.10.1, PROJ 9.5.1; sf_use_s2() is TRUE
fl_lakes <- st_read("data/Florida_Lakes/Florida_Lakes.shp")
## Reading layer `Florida_Lakes' from data source 
##   `C:\Users\se08m\OneDrive\Documents\GitHub\dataviz_final_project\project-02\data\Florida_Lakes\Florida_Lakes.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 4243 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -87.42774 ymin: 25.02625 xmax: -80.03097 ymax: 31.00254
## Geodetic CRS:  WGS 84
ggplot(fl_lakes) +
  geom_sf(fill = "lightblue", color = "blue") +
  theme_minimal() +
  labs(title = "Lakes in Florida", caption = "Source: Florida_Lakes.shp")

leaflet(fl_lakes) %>%
  addTiles() %>%
  addPolygons(color = "blue", weight = 1, fillOpacity = 0.5,
              popup = ~NAME)
weather <- read_csv("data/atl-weather.csv")
## Rows: 365 Columns: 40
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (3): summary, icon, precipType
## dbl  (29): moonPhase, precipIntensity, precipIntensityMax, precipProbability...
## dttm  (8): time, sunriseTime, sunsetTime, precipIntensityMaxTime, temperatur...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(broom)

weather <- weather %>%
  mutate(date = as.Date(time))

weather_model <- lm(temperatureHigh ~ humidity + dewPoint, data = weather)

summary(weather_model)
## 
## Call:
## lm(formula = temperatureHigh ~ humidity + dewPoint, data = weather)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.9074 -1.4603 -0.1428  1.6896  8.7372 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  57.48236    0.77107   74.55   <2e-16 ***
## humidity    -60.01841    1.27171  -47.20   <2e-16 ***
## dewPoint      1.07709    0.01115   96.64   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.977 on 362 degrees of freedom
## Multiple R-squared:  0.9627, Adjusted R-squared:  0.9625 
## F-statistic:  4674 on 2 and 362 DF,  p-value: < 2.2e-16
# Visualize coefficients using broom
tidy(weather_model) %>%
  ggplot(aes(x = reorder(term, estimate), y = estimate)) +
  geom_col(fill = "#0072B2") +
  geom_errorbar(aes(ymin = estimate - std.error, ymax = estimate + std.error), width = 0.2) +
  coord_flip() +
  labs(title = "Linear Model Coefficients",
       x = "Predictor",
       y = "Estimated Effect on High Temperature (°F)") +
  theme_minimal()

ggplot(weather, aes(x = dewPoint, y = temperatureHigh)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = TRUE, color = "blue") +
  labs(title = "Relationship Between Dew Point and High Temperature",
       x = "Dew Point (°F)",
       y = "High Temperature (°F)") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

1. Spatial Visualization

Using leaflet, I created an interactive map of Florida’s lakes. Users can pan and zoom across the state and click a lake to reveal its name, which lets them engage directly with the geographic data.

Florida contains thousands of lakes (the shapefile holds 4,243 features), and an interactive map lets viewers focus on the area they care about without being overwhelmed by the full dataset.

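2. Linear Model Analysis

I modeled daily high temperature as a linear function of humidity and dew point (temperatureHigh ~ humidity + dewPoint):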

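Dew point has a strong positive relationship with high temperature (a 1.08°F increase per °F of dew point, holding humidity constant).

Humidity has a negative association once dew point is held constant (an estimate of −60.02 per unit of humidity). This is physically plausible: at a fixed dew point, higher relative humidity means the air temperature sits closer to the dew point, so the day is cooler.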

R² = 0.9627, indicating the model explains 96.27% of the variation in daily high temperatures.

The coefficients plot, with error bars of ±1 standard error, summarizes the strength and direction of each predictor.

Dew point tracks heat directly, whereas high humidity at a given dew point signals cooler air, so the two predictors pull in opposite directions.
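To make the coefficients concrete, the fitted equation can be evaluated by hand. The sketch below hard-codes the estimates from the summary output above. The helper name predict_high and the example inputs are illustrative only, and it assumes humidity is recorded as a fraction between 0 and 1, which the size of the humidity coefficient suggests but the data description does not state.

```r
# Coefficients copied from summary(weather_model) above
coefs <- c(intercept = 57.48236, humidity = -60.01841, dewPoint = 1.07709)

# Hypothetical helper: predicted daily high (°F) from the fitted equation
predict_high <- function(humidity, dew_point) {
  unname(coefs["intercept"] +
           coefs["humidity"] * humidity +
           coefs["dewPoint"] * dew_point)
}

# Assumed example day: humidity 0.70, dew point 60 °F
predict_high(0.70, 60)                            # about 80.1 °F

# Effect of raising humidity by 0.10 while dew point stays fixed
predict_high(0.80, 60) - predict_high(0.70, 60)   # about -6.0 °F
```

With the fitted model object available, predict(weather_model, newdata = ...) would give the same numbers without copying coefficients by hand.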

3. Interactive Plot

Using leaflet, I rendered:

A basemap with addTiles()

Polygons of lakes with popups showing names

This fulfills the interactive requirement and makes sharing/embedding easier.
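As a rough sketch of the sharing step, a leaflet map can be written to an HTML file with htmlwidgets::saveWidget(). Because Florida_Lakes.shp is not distributed with this text, the example assumes the North Carolina counties shapefile bundled with sf as a stand-in; the object names and the output file name lake_map.html are placeholders.

```r
library(sf)
library(leaflet)
library(htmlwidgets)

# Stand-in data: the nc.shp file that ships with sf (not the Florida lakes)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# leaflet expects longitude/latitude in WGS 84 (EPSG:4326)
nc_wgs84 <- st_transform(nc, 4326)

lake_map <- leaflet(nc_wgs84) %>%
  addTiles() %>%
  addPolygons(color = "blue", weight = 1, fillOpacity = 0.5,
              popup = ~NAME)   # NAME is an attribute column of the sf object

# Write an HTML page plus a folder of supporting files;
# selfcontained = TRUE would bundle everything into one file but needs pandoc
saveWidget(lake_map, "lake_map.html", selfcontained = FALSE)
```

The Florida data already uses WGS 84 according to the st_read() output, so the st_transform() call would be unnecessary there.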

The leaflet map initially failed because I tried to apply st_coordinates() to the full leaflet object. I corrected this by passing the shapefile directly to leaflet() and addPolygons() and referencing the attribute column with a formula (~NAME) for the popup.

Date/time columns in the weather data required careful parsing due to mixed formats (dttm, chr).
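One way to make the parsing explicit is to declare column types up front rather than rely on readr's guesses. The sketch below uses two made-up rows as a stand-in for atl-weather.csv; the values, the ISO 8601 timestamp format, and the America/New_York time zone are assumptions for illustration, not facts taken from the data file.

```r
library(readr)

# Two made-up rows standing in for atl-weather.csv (values are illustrative only)
csv_text <- I(paste(
  "time,summary,temperatureHigh",
  "2019-01-01T05:00:00Z,Overcast,55.1",
  "2019-01-02T05:00:00Z,Clear,60.3",
  sep = "\n"
))

weather_demo <- read_csv(
  csv_text,
  col_types = cols(
    time = col_datetime(),             # ISO 8601 timestamps, parsed as UTC
    summary = col_character(),
    temperatureHigh = col_double()
  )
)

# as.Date() on a date-time defaults to UTC; pass tz to get the local calendar day
weather_demo$date <- as.Date(weather_demo$time, tz = "America/New_York")
weather_demo
```

Declaring col_types also silences the column specification message that read_csv() prints by default.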

Creating a readable coefficients plot required using broom::tidy() and flipping the coordinate system with coord_flip() for better visibility.
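A possible variation on that plot is sketched below: it asks broom::tidy() for confidence intervals and drops the intercept so the slopes are not dwarfed by it. The model is fitted to R's built-in airquality data purely as a stand-in, since the weather file is not bundled here, and the names demo_model, coef_tbl, and coef_plot are placeholders.

```r
library(broom)
library(dplyr)
library(ggplot2)

# Stand-in model on built-in data (not the Atlanta weather data)
demo_model <- lm(Temp ~ Wind + Solar.R, data = airquality)

coef_tbl <- tidy(demo_model, conf.int = TRUE) %>%   # adds conf.low and conf.high
  filter(term != "(Intercept)")                      # keep the slopes only

coef_plot <- ggplot(coef_tbl, aes(x = reorder(term, estimate), y = estimate)) +
  geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) +
  geom_hline(yintercept = 0, linetype = "dashed") +  # reference line at no effect
  coord_flip() +
  labs(x = "Predictor", y = "Estimate (95% confidence interval)") +
  theme_minimal()

coef_plot
```

Point ranges with a zero reference line show at a glance whether an interval excludes zero, which bars of ±1 standard error do not convey as directly.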

Clarity & Simplicity: Each chart includes clear labels, consistent color palettes, and direct visual mappings.

Annotation: The coefficients plot includes error bars (±1 standard error) to reflect uncertainty in the estimates.

Engagement: Used interactivity to allow users to explore Florida’s lake data on their own terms.

Reproducibility: The project uses an RStudio Project structure and .Rmd files so the analysis can be rerun and verified.

With more time, I could:

Animate temporal trends in weather using gganimate.

Explore time-series decomposition of temperatures using tsibble and feasts.

Include more spatial overlays, such as city boundaries or heat maps of temperature deviations.